Gene expression is significantly stochastic, making modeling of
genetic networks challenging. We present an approximation that 
allows the calculation of not only the mean and variance but 
also the distribution of protein numbers. We assume that pro- 
teins decay substantially slower than their mRNA and confirm 
that many genes satisfy this relation using high-throughput data 
from budding yeast. For a two-stage model of gene expression, 
with transcription and translation as first-order reactions, we cal- 
culate the protein distribution for all times greater than several 
mRNA lifetimes and thus qualitatively predict the distribution of 
times for protein levels to first cross an arbitrary threshold. If 
in addition the promoter fluctuates between inactive and active 
states, we can find the steady-state protein distribution, which 
can be bimodal if promoter fluctuations are slow. We show that 
our assumptions imply that protein synthesis occurs in geometrically
distributed bursts and allow mRNA to be eliminated from
a master equation description. In general, we find that protein
distributions are asymmetric and may be poorly characterized by 
their mean and variance. Through maximum likelihood methods, 
our expressions should therefore allow more quantitative compar- 
isons with experimental data. More generally, we introduce a 
technique to derive a simpler, effective dynamics for a stochastic 
system by eliminating a fast variable. 

Gene expression in both prokaryotes and eukaryotes is
inherently stochastic [1, 2, 3, 4]. This stochasticity is 
both controlled and exploited by cells, and, as such, must be 
included in models of genetic networks [5, 6]. Here we will 
focus on describing intrinsic fluctuations, those generated 
by the random timing of individual chemical reactions, but 
extrinsic fluctuations are equally important and arise from 
the interactions of the system of interest with other stochas- 
tic systems in the cell or its environment [7, 8]. Typically, 
experimental data are compared with predictions of mean 
behaviors and sometimes with the predicted standard de- 
viation around this mean because protein distributions are 
often difficult to derive analytically, even for models with 
only intrinsic fluctuations. 

We will propose a general, although approximate, 
method for solving the master equation for models of gene 
expression. Our approach exploits the difference in life- 
times of mRNA and protein and is valid when the protein 
lifetime is greater than the mRNA lifetime. Typically, pro- 
teins exist for at least several mRNA lifetimes, and protein 
fluctuations are determined by only time-averaged proper- 
ties of mRNA fluctuations. Following others [7, 9, 10, 11], 
we will use this time-averaging to simplify the mathemati- 
cal description of stochastic gene expression. 

For many organisms, single cell experiments have shown 
that gene expression can be described by a three-stage 
model [3, 4, 12, 13, 14]. The promoter of the gene of interest 
can transition between two states [10, 15, 16, 17], one active 
and one inactive. Such transitions could be from changes 
in chromatin structure, from binding and unbinding of pro- 
teins involved in transcription [3, 4, 12], or from pausing by 
RNA polymerase [18]. Transcription can only occur if the
promoter region is active. Both transcription and translation,
as well as the degradation of mRNAs and proteins,
are usually modelled as first-order chemical reactions [5].

By taking the limit of a large ratio of protein to mRNA 
lifetimes, we will study the three-stage model and a simpler 
two-stage version where the promoter is always active. For 
this two-stage model, we will derive the protein distribu- 
tion as a function of time. We will derive the steady-state 
protein distribution for the full, three-stage model. We also 
include expressions for the corresponding mRNA distribu- 
tions [14, 16] in the Supporting information. 

A two-stage model of gene expression. We will first consider
the model of gene expression in Fig. 1a [9]. This
model assumes the promoter is always active and so has 
two stochastic variables: the number of mRNAs and the 
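number of proteins. The probability of having $m$ mRNAs
and $n$ proteins at time $t$, $P_{m,n}(t)$, satisfies a master equation:

$$\frac{\partial P_{m,n}}{\partial t} = v_0 \left(P_{m-1,n} - P_{m,n}\right) + v_1 m \left(P_{m,n-1} - P_{m,n}\right) + d_0 \left[(m+1) P_{m+1,n} - m P_{m,n}\right] + d_1 \left[(n+1) P_{m,n+1} - n P_{m,n}\right] \qquad [1]$$

with $v_0$ being the probability per unit time of transcription,
$v_1$ being the probability per unit time of translation,
$d_0$ being the probability per unit time of degradation of an
mRNA, and $d_1$ being the probability per unit time of degradation
of a protein. By defining the generating function,
$F(z', z)$, by $F(z', z) = \sum_{m,n} z'^m z^n P_{m,n}$, we can convert
Eq. 1 into a first-order partial differential equation:

$$\frac{1}{v}\frac{\partial F}{\partial \tau} + \frac{\partial F}{\partial v} - \gamma \left[ b(1+u) - \frac{u}{v} \right] \frac{\partial F}{\partial u} = a \frac{u}{v} F \qquad [2]$$

where we have rescaled [19], with $a = v_0/d_1$, $b = v_1/d_0$,
$\gamma = d_0/d_1$, and $\tau = d_1 t$, and where $u = z' - 1$ and $v = z - 1$.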

If the protein lifetime is much greater than the mRNA
lifetime and $\gamma \gg 1$, Eq. 2 can be solved using the method
of characteristics. Let $r$ measure the distance along a
characteristic which starts at $\tau = 0$ with $u = u_0$ and $v = v_0$ for
some constants $u_0$ and $v_0$, then Eq. 2 is equivalent to [20]
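
$$\frac{dv}{dr} = 1, \qquad \frac{d\tau}{dr} = \frac{1}{v}, \qquad \frac{du}{dr} = -\gamma \left[ b(1+u) - \frac{u}{v} \right], \qquad \frac{dF}{dr} = a \frac{u}{v} F.$$

Consequently direct integration implies $r = v$ and $v = v_0 e^{\tau}$. If
$\gamma \gg 1$, $u(v)$ obeys (Supporting information)

$$u(v) \simeq \left( u_0 - \frac{b v_0}{1 - b v_0} \right) e^{-\gamma b (v - v_0)} \left( \frac{v}{v_0} \right)^{\gamma} + \frac{b v}{1 - b v}$$

as $v = v_0 e^{\tau} > v_0$ for $\tau > 0$. When $\gamma \gg 1$, $u$ rapidly tends
to a fixed function of $v$, $u \simeq bv/(1 - bv)$: for most of a protein's
lifetime, the dynamics of mRNA is at steady-state. The generating
function then obeys

$$\frac{dF}{dv} \simeq \frac{ab}{1 - bv} F. \qquad [6]$$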

Intuitively, Eq. 6 arises from Eq. 2 because large $\gamma$ causes
the term in square brackets in Eq. 2 to tend to zero to
keep $F(u,v)$ finite and well defined. Eq. 6 describes only
the distribution for protein numbers: $F(u,v)$ is just a function
of $v$. Terms of higher order in $\gamma^{-1}$ will depend on $u$.
Large $\gamma$ implies that most of the mass of the joint probability
distribution of mRNA and protein is peaked at $m = 0$:
$P_{m,n} \simeq P_{0,n}$.

We can find the probability distribution for protein
numbers as a function of time by integrating Eq. 6.
Integration gives

$$F(z,\tau) = \left[ \frac{1 - b(z-1) e^{-\tau}}{1 + b - bz} \right]^{a} \qquad [7]$$

assuming that no proteins exist at $\tau = 0$. From the definition
of a generating function, expanding $F(z)$ in $z$ gives
(Supporting information)
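
$$P_n(\tau) = \frac{\Gamma(a+n)}{\Gamma(n+1)\Gamma(a)} \left( \frac{b}{1+b} \right)^{n} \left( \frac{1 + b e^{-\tau}}{1+b} \right)^{a} {}_2F_1\!\left( -n, -a, 1-a-n; \frac{1+b}{e^{\tau}+b} \right) \qquad [8]$$

where $P_n(\tau) = P_{0,n}(\tau)$. Here ${}_2F_1(a,b,c;z)$ is a hypergeometric
function and $\Gamma$ denotes the gamma function [21].
Eq. 8 is valid when $\gamma \gg 1$, $\tau \gg \gamma^{-1}$ to allow the mRNA
distribution to reach steady-state, and $a$ and $b$ are finite.
The mean, $\langle n \rangle = ab(1 - e^{-\tau})$, and the variance,
$\langle n^2 \rangle - \langle n \rangle^2 = \langle n \rangle (1 + b + b e^{-\tau})$, of Eq. 8 agree with earlier
results [9]. At steady-state $\tau \gg 1$ and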

$$P_n = \frac{\Gamma(a+n)}{\Gamma(n+1)\Gamma(a)} \left( \frac{b}{1+b} \right)^{n} \left( 1 - \frac{b}{1+b} \right)^{a} \qquad [9]$$

which is a negative binomial distribution. We verified Eq.
8 and Eq. 9 with stochastic simulations using the Gibson-Bruck
version [22] of the Gillespie algorithm [23] and the
Facile network compiler and stochastic simulator [24]. If
$\gamma \gg 1$, Eq. 9 accurately predicts the distribution described
by Eq. 1 (Fig. 1b and 1c), but it fails as expected for smaller
$\gamma$. This effect can be quantified by calculating the
Kullback-Leibler divergence between the predicted and simulated
distributions for different $\gamma$ (Fig. 1d). Eq. 8 is illustrated in
Fig. 2a and 2b. As well as $\gamma \gg 1$, times with $\tau \gg \gamma^{-1}$ are
necessary for negligible Kullback-Leibler divergences (Fig.
2c).
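
The comparison between simulation and the negative binomial can be sketched in a few lines. What follows is a minimal direct-method Gillespie simulation of the two-stage model, not the Gibson-Bruck implementation or the Facile simulator used for the figures. The function name `simulate_two_stage`, the burn-in length, and the parameter values in the usage note are illustrative assumptions.

```python
import random


def simulate_two_stage(v0, v1, d0, d1, t_end, seed=0):
    """Direct-method Gillespie simulation of the two-stage model.

    v0, v1: transcription and translation probabilities per unit time
    d0, d1: mRNA and protein degradation probabilities per unit time
    Returns the time-averaged mean and variance of the protein number n.
    """
    rng = random.Random(seed)
    t, m, n = 0.0, 0, 0
    w_sum = n_sum = n2_sum = 0.0      # time-weighted accumulators
    burn_in = 10.0 / d1               # assumed transient: ten protein lifetimes
    while t < t_end:
        rates = (v0, v1 * m, d0 * m, d1 * n)
        total = sum(rates)
        dt = rng.expovariate(total)   # waiting time to the next reaction
        if t > burn_in:
            w_sum += dt
            n_sum += n * dt
            n2_sum += n * n * dt
        t += dt
        x = rng.random() * total      # choose which reaction fires
        if x < rates[0]:
            m += 1                    # transcription
        elif x < rates[0] + rates[1]:
            n += 1                    # translation
        elif x < rates[0] + rates[1] + rates[2]:
            m -= 1                    # mRNA degradation
        else:
            n -= 1                    # protein degradation
    mean = n_sum / w_sum
    var = n2_sum / w_sum - mean * mean
    return mean, var
```

For example, `simulate_two_stage(v0=5.0, v1=40.0, d0=20.0, d1=1.0, t_end=3000.0)` corresponds to $a = 5$, $b = 2$ and $\gamma = 20$. For large $\gamma$ the time-averaged mean should lie near $ab$ and the ratio of variance to mean near $1 + b$, the values implied by a negative binomial with these parameters. Repeating the run with smaller `d0` is one way to see the approximation degrade.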

Eq. 8 allows complete characterization of the Markov
process underlying the two-stage model. The 'propagator'
probability, $P_{n|k}(\tau)$, which is the probability of having $n$
proteins at time $\tau$ given $k$ proteins initially, satisfies
(Supporting information)

$$P_{n|k}(\tau) = \sum_{r=0}^{k} \binom{k}{r} P_{n-r}(\tau) \left( 1 - e^{-\tau} \right)^{k-r} e^{-r\tau} \qquad [10]$$

where $P_n(\tau) = 0$ if $n < 0$. With Eqs. 8 and 10, two-stage
gene expression is in principle completely characterized
for $\gamma \gg 1$ and $\tau \gg \gamma^{-1}$. For example, we can calculate
how the noise in protein numbers, $\eta$ (their standard
deviation divided by their mean), changes with time. If
protein numbers initially have a distribution $P_k^{(0)}$ then at
a time $\tau$ their distribution will be $\sum_k P_{n|k}(\tau) P_k^{(0)}$. The
noise of this distribution can either increase, decrease, or
behave non-monotonically as time increases (Fig. 2d). We
can also calculate non-steady state auto-correlation functions
and first-passage time distributions for protein levels
to first cross a threshold, $N$ (with some standard numerics).
In general, such distributions are only qualitative because
contributions from times with $\tau < \gamma^{-1}$ are always
relevant. Accuracy can be improved by having $\gamma \gg 10$
and a sufficiently high threshold (Fig. 2e and Supporting
information).
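
The time-dependent distribution and the propagator can be evaluated numerically without a hypergeometric routine. The sketch below assumes that the generating function of Eq. 7 has the form $F(z,\tau) = [(1 - b(z-1)e^{-\tau})/(1 + b - bz)]^a$ and expands its two factors as power series in $z$. It then builds the propagator of Eq. 10 by combining binomial survival of the $k$ initial proteins with newly synthesized ones. The function names and the truncation `n_max` are illustrative choices.

```python
import math


def protein_distribution(a, b, tau, n_max):
    """P_n(tau) for n = 0..n_max, assuming no proteins at tau = 0."""
    p = b / (1.0 + b)                  # steady-state burst parameter
    q = b / (math.exp(tau) + b)        # its time-dependent counterpart
    # Series coefficients of (1 - q z)^a and (1 - p z)^(-a).
    A, B = [1.0], [1.0]
    for k in range(1, n_max + 1):
        A.append(A[-1] * (-q) * (a - k + 1) / k)
        B.append(B[-1] * p * (a + k - 1) / k)
    pref = ((1.0 + b * math.exp(-tau)) / (1.0 + b)) ** a
    # The coefficient of z^n in the product is a discrete convolution.
    return [pref * sum(A[k] * B[n - k] for k in range(n + 1))
            for n in range(n_max + 1)]


def propagator(a, b, tau, k, n_max):
    """P_{n|k}(tau): r of the k initial proteins survive, n - r are new."""
    P = protein_distribution(a, b, tau, n_max)
    s = math.exp(-tau)                 # survival probability of one protein
    out = []
    for n in range(n_max + 1):
        total = 0.0
        for r in range(min(k, n) + 1):
            total += math.comb(k, r) * s**r * (1.0 - s)**(k - r) * P[n - r]
        out.append(total)
    return out
```

A simple consistency check is that the mean of `protein_distribution` equals $ab(1 - e^{-\tau})$ and that the mean of `propagator` equals $k e^{-\tau} + ab(1 - e^{-\tau})$. Averaging `propagator` over an initial distribution of $k$ gives the distribution at later times, from which the noise $\eta$ can be followed in time.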

We can derive Eq. 9 more intuitively. An mRNA undergoes
a competition between translation and degradation
because ribosomes and degradosomes bind to it mutually
exclusively [25]. For each competition, the probability of
a ribosome binding to the mRNA is $\frac{v_1}{v_1 + d_0} = \frac{b}{1+b}$. If we
assume that proteins have longer lifetimes than mRNAs
($\gamma \gg 1$), then each protein synthesized from a given mRNA
will not on average be degraded before the mRNA is degraded.
On protein timescales, all the proteins synthesized
from an mRNA will appear to be synthesized simultaneously
(Fig. 5 in Supporting information). Consequently,
the probability of $r$ new proteins being produced by the
synthesis and degradation of one mRNA is equal to the
probability of an mRNA being translated $r$ times. This
probability is [25]

$$P(r) = \left( \frac{b}{1+b} \right)^{r} \left( 1 - \frac{b}{1+b} \right) \qquad [11]$$

which is a geometric, or 'burst', distribution. Alternatively,
we can consider the lifetime $t'$ of each mRNA. This lifetime
is stochastic and satisfies $P(t') = d_0 e^{-d_0 t'}$, the distribution
expected for any first-order decay process [26]. Protein
synthesis is also first order, and the number of proteins, $r$,
synthesized by an mRNA during its lifetime satisfies a Poisson
process: $\frac{(v_1 t')^r}{r!} e^{-v_1 t'}$ [26]. Consequently, the probable
number of proteins synthesized from a particular mRNA is

$$P(r) = \int_0^\infty dt' \, d_0 e^{-d_0 t'} \frac{(v_1 t')^r}{r!} e^{-v_1 t'} \qquad [12]$$

which integrates to Eq. 11. Eq. 11 is equivalent to an
exponential distribution with a parameter $\lambda$ where
$\lambda = -\log\left(1 - \frac{1}{1+b}\right)$ [27]. If $b \gg 1$, then $\lambda \simeq 1/b$. Exponential
bursts of protein synthesis have been characterized
experimentally [28, 29]. We note that Eq. 11 has a generating
function $f(z) = (1 + b - bz)^{-1}$.
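
The step from the lifetime integral to the geometric distribution can be written out explicitly. The worked calculation below assumes the integrand is the exponential lifetime density multiplied by the Poisson probability of $r$ translation events, as described in the text, and uses only the standard integral $\int_0^\infty t^r e^{-ct}\,dt = r!/c^{r+1}$.

```latex
\begin{align*}
P(r) &= \int_0^\infty dt'\, d_0 e^{-d_0 t'} \frac{(v_1 t')^r}{r!} e^{-v_1 t'}
      = \frac{d_0 v_1^r}{r!} \int_0^\infty dt'\, t'^r e^{-(d_0 + v_1) t'} \\
     &= \frac{d_0 v_1^r}{r!} \cdot \frac{r!}{(d_0 + v_1)^{r+1}}
      = \left( \frac{v_1}{v_1 + d_0} \right)^{r} \frac{d_0}{v_1 + d_0}
      = \left( \frac{b}{1+b} \right)^{r} \left( 1 - \frac{b}{1+b} \right),
\end{align*}
```

where the last equality uses $b = v_1/d_0$. The mean of this geometric distribution is $b$, so $b$ can be read as the mean burst size.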

Given that the synthesis and degradation of one mRNA
generates a burst of $r$ proteins, then the number of proteins
at steady-state is given by the typical number of mRNAs
synthesized during a protein lifetime, $v_0/d_1 = a$, and the
burst size $r_i$ for each mRNA. The number of proteins $n$ will
be the sum of these $r_i$. If we assume that there are sufficient
ribosomes and charged tRNAs, then translation from each mRNA is
independent. The generating function of a sum of independent
variables is the product of their individual generating
functions [26]. Consequently, the generating function for
$P_n$, $F(z)$, satisfies

$$F(z) = \prod_{i=1}^{a} f(z) = (1 + b - bz)^{-a} \qquad [13]$$

which is Eq. 7 when $\tau \gg 1$, and so derives Eq. 9.

By assuming explicitly that protein synthesis occurs in
bursts, we can derive an effective master equation for gene
expression that considers only proteins, but implicitly
includes mRNA fluctuations [19, 30]. We will show that this
master equation has Eq. 8 as its solution and so is equivalent
to the large $\gamma$ approximation to Eq. 1, the master
equation for both mRNA and protein. If we assume that
each mRNA synthesized leaves behind a burst of $r$ proteins
then

$$\frac{\partial P_n}{\partial \tau} = a \left[ \sum_{r=0}^{n} \frac{b^r}{(1+b)^{r+1}} P_{n-r} - P_n \right] + (n+1) P_{n+1} - n P_n \qquad [14]$$

where the size of each burst has been determined by Eq. 11
[30]. Eq. 14 has Eq. 7 as its generating function (Supporting
information). By introducing bursts of protein synthesis,
mRNA fluctuations can be absorbed into a one-variable
master equation provided $\gamma \gg 1$. Friedman et al. used a
continuous version of this approach with an exponential
burst distribution inspired by their experimental results
[28, 29]. They derived a gamma distribution for steady-state
protein numbers [19]. Eq. 9 tends to this distribution

$$P(n) = \frac{n^{a-1} e^{-n/b}}{b^{a} \Gamma(a)} \qquad [15]$$

for large $n$ (Supporting information). Friedman et al. also
demonstrated that the burst approximation remains valid
when negative or positive feedback is included [19].
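
The claim that the negative binomial solves the one-variable master equation at steady state can be checked numerically. The sketch below assumes that Eq. 14 has the form $\partial P_n/\partial\tau = a[\sum_{r=0}^{n} b^r (1+b)^{-(r+1)} P_{n-r} - P_n] + (n+1)P_{n+1} - nP_n$ and that Eq. 9 is the negative binomial with parameters $a$ and $b/(1+b)$. It evaluates the right-hand side on that distribution. The function names are illustrative.

```python
import math


def negative_binomial(a, b, n_max):
    """Steady-state protein distribution for n = 0..n_max."""
    p = b / (1.0 + b)
    return [math.exp(math.lgamma(a + n) - math.lgamma(n + 1) - math.lgamma(a)
                     + n * math.log(p) + a * math.log(1.0 - p))
            for n in range(n_max + 1)]


def burst_master_rhs(P, a, b):
    """Right-hand side of the burst master equation for n = 0..len(P)-2."""
    n_max = len(P) - 1
    p = b / (1.0 + b)
    # Geometric burst-size probabilities b^r / (1 + b)^(r + 1).
    burst = [p**r / (1.0 + b) for r in range(n_max + 1)]
    rhs = []
    for n in range(n_max):
        gain = sum(burst[r] * P[n - r] for r in range(n + 1))
        rhs.append(a * (gain - P[n]) + (n + 1) * P[n + 1] - n * P[n])
    return rhs
```

If the right-hand side vanishes to rounding error for every $n$, the distribution is stationary. Passing any other distribution to `burst_master_rhs` gives the instantaneous rate of change, so the same function could drive a simple time-stepping scheme.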

In summary, we have shown that exploiting the difference
between protein and mRNA lifetimes through a large
value of $\gamma$, but finite $a$ and $b$, allows powerful mathematical
simplifications. Large $\gamma$ implies that mRNA is at steady-state
for most of the lifetime of a protein and that the probability
mass of the joint distribution of protein and mRNA
is peaked at zero mRNAs, although the mean number of
mRNAs need not be zero (Fig. 1). The number of proteins
translated from an mRNA obeys a geometric distribution
in both the two-stage and three-stage models [25], but large
$\gamma$ implies that the proteins translated from an mRNA all
appear, on protein timescales, simultaneously so that the
synthesis and degradation of an mRNA leaves behind a
geometric burst of proteins. If $\gamma < 1$, then proteins synthesized
from a particular mRNA will be degraded as further proteins
are synthesized, and the distribution describing the
number of proteins remaining once the mRNA is degraded
will no longer be geometric. Explicitly including geometric
bursts accurately describes the effects of mRNA fluctuations
on the distribution of protein numbers when $\gamma \gg 1$.
It allows the model of Fig. 1a to be described by a
one-variable master equation: Eq. 14.

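A three-stage model of gene expression. We next consider
the full three-stage model of gene expression (Fig. 3a). We
find the protein distribution for this system by taking the
large $\gamma$ limit of the master equation. Let $P^{(0)}_{m,n}$ be the
probability of having $m$ mRNAs and $n$ proteins when the DNA
is inactive and $P^{(1)}_{m,n}$ be the probability of having $m$ mRNAs
and $n$ proteins when the DNA is active. We then have two
coupled equations:

$$\frac{\partial P^{(0)}_{m,n}}{\partial \tau} = \kappa_1 P^{(1)}_{m,n} - \kappa_0 P^{(0)}_{m,n} + (n+1) P^{(0)}_{m,n+1} - n P^{(0)}_{m,n} + \gamma \left[ (m+1) P^{(0)}_{m+1,n} - m P^{(0)}_{m,n} + b m \left( P^{(0)}_{m,n-1} - P^{(0)}_{m,n} \right) \right] \qquad [16]$$

$$\frac{\partial P^{(1)}_{m,n}}{\partial \tau} = -\kappa_1 P^{(1)}_{m,n} + \kappa_0 P^{(0)}_{m,n} + (n+1) P^{(1)}_{m,n+1} - n P^{(1)}_{m,n} + a \left( P^{(1)}_{m-1,n} - P^{(1)}_{m,n} \right) + \gamma \left[ (m+1) P^{(1)}_{m+1,n} - m P^{(1)}_{m,n} + b m \left( P^{(1)}_{m,n-1} - P^{(1)}_{m,n} \right) \right] \qquad [17]$$

where $\kappa_0 = k_0/d_1$ and $\kappa_1 = k_1/d_1$.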

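We solve Eqs. 16 and 17 at steady-state by taking the
large $\gamma$ limit of the equivalent equations for their generating
functions (a generating function is defined for each state
of the promoter). Our approach is a natural extension of
the method used to solve the two-stage model (Supporting
information). We find that

$$P_n = \frac{\Gamma(\alpha+n)\,\Gamma(\beta+n)\,\Gamma(\kappa_0+\kappa_1)}{\Gamma(n+1)\,\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\kappa_0+\kappa_1+n)} \left( \frac{b}{1+b} \right)^{n} \left( 1 - \frac{b}{1+b} \right)^{\alpha} {}_2F_1\!\left( \alpha+n, \kappa_0+\kappa_1-\beta, \kappa_0+\kappa_1+n; \frac{b}{1+b} \right) \qquad [18]$$

where

$$\alpha = \frac{1}{2} \left( a + \kappa_0 + \kappa_1 + \phi \right) \qquad [19]$$

$$\beta = \frac{1}{2} \left( a + \kappa_0 + \kappa_1 - \phi \right) \qquad [20]$$

and $\phi^2 = (a + \kappa_0 + \kappa_1)^2 - 4 a \kappa_0$. Eq. 18 is valid when $\gamma \gg 1$
and $a$ and $b$ are finite. The mean of this distribution is
$\langle n \rangle = \frac{a b \kappa_0}{\kappa_0 + \kappa_1}$ and the protein noise, $\eta$, satisfies

$$\eta^2 = \frac{1}{\langle n \rangle} + \frac{\gamma^{-1}}{\langle m \rangle} + \frac{d_1}{d_1 + k_0 + k_1} \eta_D^2 \qquad [21]$$

where $\langle m \rangle$ is the mean number of mRNAs, and is inversely
proportional to $\gamma$, and $\eta_D$ is the noise in the active state of
DNA: $\eta_D^2 = k_1/k_0$ [4]. As well as a Poisson-like term expected
for any birth-and-death process, protein noise has
time-averaged contributions from fluctuations in the number
of mRNAs and fluctuations in the state of DNA. We
verify Eq. 18 by simulation in Fig. 3.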
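
The three-stage distribution can be evaluated with a short routine. The formula coded below is an assumption reconstructed from the limits quoted in the text: it reduces to the negative binomial when $\kappa_1 = 0$, and its generating function is ${}_2F_1(\alpha, \beta, \kappa_0+\kappa_1; b(z-1))$. It should be checked against the original Eq. 18 before being relied on. The Gauss series is summed directly, which is adequate because the argument $b/(1+b)$ is below one. All function names are illustrative.

```python
import math


def hyp2f1_series(A, B, C, z, tol=1e-16, max_terms=100000):
    """Gauss series for 2F1(A, B; C; z); converges for |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (A + k) * (B + k) / ((C + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def three_stage_distribution(a, b, kappa0, kappa1, n_max):
    """Steady-state P_n for n = 0..n_max in the large-gamma limit."""
    s = a + kappa0 + kappa1
    phi = math.sqrt(s * s - 4.0 * a * kappa0)
    alpha, beta = 0.5 * (s + phi), 0.5 * (s - phi)
    c = kappa0 + kappa1
    p = b / (1.0 + b)
    dist = []
    for n in range(n_max + 1):
        # The prefactor is evaluated in log space to avoid overflow.
        log_pref = (math.lgamma(alpha + n) + math.lgamma(beta + n)
                    + math.lgamma(c) - math.lgamma(n + 1)
                    - math.lgamma(alpha) - math.lgamma(beta)
                    - math.lgamma(c + n)
                    + n * math.log(p) + alpha * math.log(1.0 - p))
        dist.append(math.exp(log_pref)
                    * hyp2f1_series(alpha + n, c - beta, c + n, p))
    return dist
```

With slow switching, for example `three_stage_distribution(20.0, 2.0, 0.5, 0.5, 800)`, the mean should equal $ab\kappa_0/(\kappa_0+\kappa_1)$. The squared noise can be compared with the three contributions described in the text, written in scaled units as $1/\langle n \rangle + (\kappa_0+\kappa_1)/(a\kappa_0) + (\kappa_1/\kappa_0)/(1+\kappa_0+\kappa_1)$.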

The protein distribution for the three-stage model can 
have similar behavior to the two-stage model of Fig. 1a, but
it can also generate a bimodal distribution with a peak both 
at zero and non-zero numbers of molecules (Fig. 3d). This 
bimodality is not a reflection of an underlying bistability, 
but arises from slow transitions driving the DNA between 
active and inactive states [5, 17, 31, 32]. 

As expected, Eq. 18 recovers the negative binomial distribution
under certain conditions. It tends to Eq. 9 when
$\kappa_1 \to 0$: the DNA is then always active at steady-state.
When $\kappa_1 = 0$, Eqs. 19 and 20 imply that $\alpha = a$ and $\beta = \kappa_0$,
and recall that ${}_2F_1(a, 0, c; z) = 1$ for all $a$, $c$, and $z$. Similarly,
when $\kappa_0$ and $\kappa_1$ are both large, but $\kappa_0/\kappa_1$ is fixed,
then $\alpha \simeq \kappa_0 + \kappa_1$ and $\beta \simeq \frac{a \kappa_0}{\kappa_0 + \kappa_1}$. Consequently,

$$P_n \simeq \frac{\Gamma(\beta+n)}{\Gamma(n+1)\Gamma(\beta)} \left( \frac{b}{1+b} \right)^{n} \left( 1 - \frac{b}{1+b} \right)^{\beta} \qquad [22]$$

because ${}_2F_1(a, b, a; z) = (1 - z)^{-b}$. With fast switching of
the DNA between active and inactive states, Eq. 18 becomes
Eq. 9, but with $a$ replaced by $\frac{a \kappa_0}{\kappa_0 + \kappa_1}$.
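
The fast-switching values of $\alpha$ and $\beta$ follow from expanding $\phi$ for large $\kappa_0 + \kappa_1$. The short calculation below assumes the definitions $\alpha, \beta = \frac{1}{2}(a + \kappa_0 + \kappa_1 \pm \phi)$ with $\phi^2 = (a + \kappa_0 + \kappa_1)^2 - 4a\kappa_0$, and keeps only the leading terms.

```latex
\begin{align*}
\phi &= (a + \kappa_0 + \kappa_1) \sqrt{1 - \frac{4 a \kappa_0}{(a + \kappa_0 + \kappa_1)^2}}
      \simeq (a + \kappa_0 + \kappa_1) - \frac{2 a \kappa_0}{a + \kappa_0 + \kappa_1}, \\
\beta &= \tfrac{1}{2} \left( a + \kappa_0 + \kappa_1 - \phi \right)
      \simeq \frac{a \kappa_0}{a + \kappa_0 + \kappa_1}
      \simeq \frac{a \kappa_0}{\kappa_0 + \kappa_1}, \\
\alpha &= (a + \kappa_0 + \kappa_1) - \beta
      \simeq \kappa_0 + \kappa_1 + \frac{a \kappa_1}{\kappa_0 + \kappa_1}
      \simeq \kappa_0 + \kappa_1 .
\end{align*}
```

The product $\alpha\beta = a\kappa_0$ and the sum $\alpha + \beta = a + \kappa_0 + \kappa_1$ hold exactly, which gives a quick check on the expansion.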

Discussion. We have shown we can calculate distributions
for protein numbers by assuming protein lifetimes are
longer than mRNA lifetimes while $a$, the number of mRNAs
transcribed during a protein lifetime, and $b$, the number
of proteins translated during an mRNA lifetime, are finite.
Fig. 4a shows the ratio $\gamma$ measured for almost 2,000
genes in budding yeast. Around 80% of the genes have $\gamma$
greater than one and the median value is approximately 3
(we include the data set in Supporting information). We
therefore expect our predicted distributions to be widely
applicable in budding yeast. In bacteria, too, $\gamma$ is expected
to be greater than 1 because mRNA lifetimes are usually
minutes (they are typically tens of minutes in yeast) and
protein lifetimes are often determined by the length of the
cell cycle (typically 30 or more minutes) [33].

Values of $\gamma > 1$ reduce protein fluctuations by allowing
more averaging of the underlying mRNA fluctuations
(Eq. 21). We indeed observe a small, but statistically
significant, negative correlation between total noise and $\gamma$
using the data of Newman et al. [34] (a rank correlation of
$-0.2$ with a P value of 10~^). In Fig. 4b, we have calculated
the median $\gamma$ for yeast genes in different gene ontology
classes. All classes have a median $\gamma > 1$. Proteins involved
in transferring nucleotidyl groups, which include RNA and
DNA polymerases, have a high median $\gamma > 5$, presumably
because high stochasticity in these proteins can undermine
many cellular processes. Similarly, proteins that contribute
to the structural integrity of protein complexes have a
median $\gamma > 5$. Large fluctuations can vastly reduce the
efficiency of complex assembly by preventing complete
complexes from forming because of a shortage of one or more
components [11, 35]. Perhaps surprisingly, transcription factors
have a low median $\gamma$, although still greater than 1. Although
low $\gamma$ does increase stochasticity, it can allow quick response
times if the protein degradation rate is high. A high protein
degradation rate may also keep numbers of transcription factors
low to reduce deleterious non-specific chromosomal binding.

We show that protein synthesis occurs in bursts in both
the two- and the three-stage model when $\gamma \gg 1$. Such
bursts of gene expression have been measured in bacteria
and eukaryotes [12, 13, 14, 28, 29]. They allow mRNA
to be replaced in the master equation by a geometric
distribution for protein synthesis for all times greater than
several mRNA lifetimes if their source is translation and
the protein lifetime is substantially longer than the mRNA
lifetime. Such an approach has already been proposed [19],
but without determining its validity. Similarly, if mRNA
fluctuations are negligible, the master equation reduces to
one variable (protein), and describing the protein distribution
becomes substantially easier [31, 36].

An important problem in systems biology is to determine
which properties of biochemical networks and the
intracellular environment must be modeled to make accurate,
quantitative predictions. As well as obscuring the process 
driving the observed phenotype, models more complex than 
needed are harder to correctly parameterize and to simulate 
to generate predictions. Our results show that complexity, 
here two states of the promoter, can be modelled by ef- 
fective parameters that under certain conditions will give 
accurate predictions of the entire distribution of protein 
numbers: Eq. 22. Alternatively, they show that not only 
the mean and variance [37] but also the protein distribu- 
tion may not have enough information to determine the 
biochemical mechanism generating gene expression from 
measurements of protein levels: Eqs. 9 and 22. Such ef- 
fects are likely to be compounded by non-steady-state dy- 
namics (Fig. 2) and extrinsic fluctuations. Collecting data 
on the corresponding mRNA distribution may disfavor the 
two-stage over the three-stage model because mRNA distri- 
butions in the three-stage model can have two peaks even 
though the protein distribution has only one [6]. In gen- 
eral, though, time series measurements, preferably with and 
without perturbations, may provide the most discrimina- 
tive power [37]. 

Experimental measurements are best compared with 
the predicted distribution rather than its mean, standard 
deviation, or mode. Both the protein and mRNA distri- 
butions are typically not symmetric and may not be uni- 
modal. Consequently, the mean and the mode can be sig- 
nificantly different, and the standard deviation can be a 
poor measure of the width of the distribution at half maximum
[38]. Such distributions are poorly characterized by
the commonly used coefficient of variation because they
are not locally Gaussian around their mean (Figs. 1-3).
In addition, fitting moments to find model parameters can
be challenging. Moments, more so than distributions, are 
functions of combinations of parameters and can also be 
badly estimated without large amounts of data, particularly
for asymmetric distributions. We therefore believe a
Bayesian or maximum likelihood approach is most suitable,
one in which the fitting procedure replicates the experimental
protocol and explicitly accounts for the shape of the
distribution and the number of measurements. For example,
irrespective of how many measurements are available,
the likelihood of the data for a particular set of parameters
can always be determined from the assumed distribution
of protein numbers. Our analytical expressions will
greatly speed up such approaches by avoiding large numbers
of simulations and by aiding in deconvolving extrinsic
fluctuations, which can substantially change the shape of
protein distributions [8].
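
A likelihood-based fit of this kind can be sketched for the simplest case. The example assumes that the protein counts are independent steady-state measurements, that they are described by a negative binomial with parameters $a$ and $b/(1+b)$, and that extrinsic fluctuations and measurement error are negligible. The data list, the grid, and the function names are hypothetical and chosen only for illustration.

```python
import math


def nb_log_likelihood(counts, a, b):
    """Log-likelihood of protein counts under the negative binomial."""
    p = b / (1.0 + b)
    return sum(math.lgamma(a + n) - math.lgamma(n + 1) - math.lgamma(a)
               + n * math.log(p) + a * math.log(1.0 - p)
               for n in counts)


def fit_grid(counts, a_grid, b_grid):
    """Return the (a, b) pair on the grid with the highest likelihood."""
    return max(((a, b) for a in a_grid for b in b_grid),
               key=lambda ab: nb_log_likelihood(counts, ab[0], ab[1]))


# Hypothetical single-cell protein counts, for illustration only.
counts = [3, 0, 7, 12, 5, 9, 1, 4, 15, 6, 2, 8, 10, 4, 6, 0, 11, 5, 3, 7]
grid = [0.5 + 0.25 * i for i in range(39)]   # 0.5, 0.75, ..., 10.0
a_hat, b_hat = fit_grid(counts, grid, grid)
```

A grid search is used here only because it is transparent; a numerical optimizer or a Bayesian sampler would replace it in practice. The same likelihood function works for any number of measurements, and the time-dependent or three-stage distributions could be substituted for the negative binomial when the experiment calls for them.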

Our results should also allow more general fluctuation 
analyses of gene expression data. Such analyses convert
fluorescence measurements into absolute units (numbers of
molecules) by exploiting that the magnitude of fluctuations
is determined by the number of molecules independently of
how those numbers are measured [39]. Converting into absolute
units is essential if information from different experiments
ments is to be combined into a larger, predictive framework, 
a goal of systems biology. 

More generally, our approach is an example of a tech- 
nique to simplify the dynamics of a stochastic system by ex- 
ploiting differences in timescales. We remove a fast stochas- 
tic variable through replacing a constant parameter (the 
parameter a) by a time-dependent parameter (the burst 
distribution) whose variation captures the effects of fluctu- 
ations in the fast variable on the dynamics of the slow one 
[40]. 

Fig. 1. Predictions and simulations for a two-stage model of gene
expression. (a) Both transcription and translation are modelled as
first-order processes: transcription occurs with a probability $v_0$ per
unit time and translation with a probability of $v_1$ per unit time.
Degradation of mRNA and protein are also both first-order processes:
mRNA degrades with a probability $d_0$ per unit time and protein
degrades with a probability $d_1$ per unit time. (b) and (c) A comparison
of Eq. 9, shown as the distribution in green, and stochastic simulations
for large and small $\gamma$. Protein distributions can be either peaked or
have a maximum only at $n = 0$ [19]. The mean number of mRNAs, $a/\gamma$,
is either 2 or 20 in b and either 0.05 or 0.5 in c. (d) The accuracy of
Eq. 9 improves with larger $\gamma$. The Kullback-Leibler divergence between
the analytical and simulated protein distributions is plotted as a
function of $\gamma$. For $\gamma$ greater than 1, the distributions become almost
indistinguishable.

Fig. 2. Predictions for the time-dependent solution of the two-stage
model of gene expression. (a), (b) The distribution of protein numbers
at different times with time increasing in the direction of the arrow.
Parameters in a correspond to Fig. 1b. There are zero proteins
initially. Parameters in b correspond to Fig. 1c. There are 50 proteins
initially. (c) The Kullback-Leibler divergence for the distributions
of a and b. The divergence decreases as $\tau = t d_1$ grows above $\gamma^{-1}$.
It is small for small times because both the simulations and the
calculations start from the same initial distribution. (d) Noise in
protein numbers as a function of time. Initially, proteins have a
negative binomial distribution chosen to have a particular magnitude
of noise. The noise at steady-state is shown by a dashed line. (e) The
calculated distributions for the first time protein levels reach a
given threshold, $N$, if initially there are zero proteins. These
distributions are qualitative with the probability typically
underestimated for small $t d_1$. They obey a renewal equation
[26], which we solve numerically.

Fig. 3. Predictions and simulations for a three-stage model of gene
expression. (a) The region of the DNA containing the promoter
transitions between inactive and active forms with probabilities per
unit time of $k_0$ and $k_1$. As an example, we show the TATA-box binding
protein driving the transition. (b), (c), and (d) A comparison of
Eq. 18, shown as the distribution in green, and stochastic simulations
for large and small $\gamma$. The mean number of mRNAs,
$\frac{a}{\gamma} \frac{\kappa_0}{\kappa_0 + \kappa_1}$, is either 3 or 30 in b, 0.075 or 0.75 in c, and
either 0.3 or 3 in d.

Fig. 4. The ratio of the protein to mRNA lifetime, $\gamma$, for 1,962 genes
in budding yeast. (a) Most proteins have $\gamma > 1$. Protein lifetimes are
from Belle et al. [41] and mRNA lifetimes are from Grigull et al.
(circles) [42] or from Wang et al. (squares) [43]. The median of $\gamma$ is
$\simeq 3$ (shown by a dashed line), while its mean is greater than 10
(although this value is probably erroneously high because of
outliers). Overall, we found little correlation between mRNA and
protein lifetimes. (b) The median value of $\gamma$ for genes in different
gene ontology classes. We plot the mean of the medians for the two
datasets. Errors in the medians are approximately 25% (using 1000
bootstrap samples for each gene ontology class). Gene annotations are
from the Saccharomyces cerevisiae genome database
(www.yeastgenome.org).